Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

نویسندگان

  • Raphael Fonteneau
  • Damien Ernst
  • Bernard Boigelot
  • Quentin Louveaux
چکیده

We study the min max optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the twostage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Relaxation Schemes for Min Max Generalization in Deterministic Batch Mode Reinforcement Learning

We study the minmax optimization problem introduced in [6] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxati...

متن کامل

Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

We study the minmax optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first ...

متن کامل

Batch mode reinforcement learning based on the synthesis of artificial trajectories

In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value functi...

متن کامل

Towards Min Max Generalization in Reinforcement Learning

In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the part...

متن کامل

Convex Batch Mode Active Sampling via alpha-relative Pearson Divergence

Active learning is a machine learning technique that trains a classifier after selecting a subset from an unlabeled dataset for labeling and using the selected data for training. Recently, batch mode active learning, which selects a batch of samples to label in parallel, has attracted a lot of attention. Its challenge lies in the choice of criteria used for guiding the search of the optimal bat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012